Taming the ReLU with Parallel Dither in a Deep Neural Network
Author
Abstract
Rectified Linear Units (ReLU) seem to have displaced traditional ‘smooth’ nonlinearities as activation-function-du-jour in many – but not all – deep neural network (DNN) applications. However, nobody seems to know why. In this article, we argue that ReLU are useful because they are ideal demodulators – this helps them perform fast abstract learning. However, this fast learning comes at the expense of serious nonlinear distortion products – decoy features. We show that Parallel Dither acts to suppress the decoy features, preventing overfitting and leaving the true features cleanly demodulated for rapid, reliable learning.
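To make the idea concrete, here is a minimal sketch in NumPy, under the assumption that ‘Parallel Dither’ amounts to averaging a ReLU layer's response over several independently dithered copies of its input; the copy count and dither amplitude are illustrative choices, not values from the paper.

import numpy as np

def relu(x):
    # Half-wave rectification: the 'demodulator' nonlinearity.
    return np.maximum(0.0, x)

def parallel_dither_relu(x, w, b, n_copies=10, dither_scale=0.1, rng=None):
    # Average the layer's output over several dithered copies of x.
    # Each copy receives independent uniform noise before the layer, so the
    # nonlinear distortion products ('decoy features') tend to average out
    # while the cleanly demodulated features survive.
    rng = np.random.default_rng() if rng is None else rng
    outputs = []
    for _ in range(n_copies):
        dither = rng.uniform(-dither_scale, dither_scale, size=x.shape)
        outputs.append(relu((x + dither) @ w + b))
    return np.mean(outputs, axis=0)

# Toy usage: a batch of 4 inputs through an 8-to-16 ReLU layer.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w = rng.normal(scale=0.5, size=(8, 16))
b = np.zeros(16)
print(parallel_dither_relu(x, w, b, rng=rng).shape)  # (4, 16)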
Related resources
Parallel Dither and Dropout for Regularising Deep Neural Networks
Effective regularisation during training can mean the difference between success and failure for deep neural networks. Recently, dither has been suggested as an alternative to dropout for regularisation during batch-averaged stochastic gradient descent (SGD). In this article, we show that these methods fail without batch averaging and we introduce a new, parallel regularisation method that may be ...
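A minimal sketch of the parallel idea, under the assumption that it amounts to tiling each training example into several independently dithered copies so the averaging happens within a single SGD step rather than across a large batch; the function name and parameters are illustrative, not the paper's.

import numpy as np

def parallel_dither_batch(x, n_copies=8, dither_scale=0.1, rng=None):
    # Tile a (possibly single-example) minibatch n_copies times and add
    # independent uniform dither to each copy. Training on the tiled batch
    # and averaging the per-copy losses recovers the averaging effect that
    # batch-averaged SGD would otherwise have to provide.
    rng = np.random.default_rng() if rng is None else rng
    tiled = np.repeat(x, n_copies, axis=0)                        # (B * n, D)
    noise = rng.uniform(-dither_scale, dither_scale, size=tiled.shape)
    return tiled + noise

# One example becomes 8 dithered copies whose losses would be averaged.
x = np.ones((1, 4))
print(parallel_dither_batch(x).shape)  # (8, 4)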
Deep Recurrent Neural Networks for Sequential (1).pages
In the analysis of modern biological data, we often deal with ill-posed problems and missing data, mostly due to the high dimensionality and multicollinearity of the dataset. In this paper, we have proposed a system based on matrix factorization (MF) and deep recurrent neural networks (DRNNs) for genotype imputation and phenotype sequence prediction. In order to model the long-term dependencie...
Understanding Deep Neural Networks with Rectified Linear Units
In this paper we investigate the family of functions representable by deep neural networks (DNN) with rectified linear units (ReLU). We give an algorithm to train a ReLU DNN with one hidden layer to global optimality with runtime polynomial in the data size albeit exponential in the input dimension. Further, we improve on the known lower bounds on size (from exponential to super exponential) fo...
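One way to see the kind of function such networks represent: a one-hidden-layer ReLU network computes a continuous piecewise linear function of its input. The sketch below is an illustrative textbook construction, not code from the paper, that builds the triangular ‘hat’ function on [0, 1] from just three ReLU units.

import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def hat(x):
    # One hidden layer, three ReLU units: rises linearly from 0 to 1 on
    # [0, 0.5], falls back to 0 on [0.5, 1], and is 0 outside [0, 1].
    return 2 * relu(x) - 4 * relu(x - 0.5) + 2 * relu(x - 1.0)

xs = np.linspace(-0.5, 1.5, 9)
print(np.round(hat(xs), 3))  # [0. 0. 0. 0.5 1. 0.5 0. 0. 0.]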
Error bounds for approximations with deep ReLU networks
We study the expressive power of shallow and deep neural networks with piecewise linear activation functions. We establish new rigorous upper and lower bounds for the network complexity in the setting of approximations in Sobolev spaces. In particular, we prove that deep ReLU networks more efficiently approximate smooth functions than shallow networks. In the case of approximations of 1D Lipschitz...
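As a concrete illustration of that efficiency, the sketch below reproduces the well-known sawtooth construction in which composing a three-unit ReLU ‘hat’ with itself yields a depth-m approximation of x² on [0, 1] whose maximum error shrinks roughly like 4^(-m). This is a NumPy sketch of the standard construction, not code taken from the paper.

import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def hat(x):
    # Three ReLU units forming a triangle with peak 1 at x = 0.5.
    return 2 * relu(x) - 4 * relu(x - 0.5) + 2 * relu(x - 1.0)

def approx_square(x, depth):
    # Piecewise linear approximation of x**2 on [0, 1] built by subtracting
    # successively finer sawtooth corrections; each correction is one more
    # composition of the hat, i.e. one more layer of depth.
    g, out = x, x
    for s in range(1, depth + 1):
        g = hat(g)              # s-fold composition: a sawtooth with 2**s teeth
        out = out - g / 4 ** s
    return out

x = np.linspace(0.0, 1.0, 1001)
for m in (1, 3, 5):
    print(m, np.max(np.abs(approx_square(x, m) - x ** 2)))  # error ~ 4**(-m)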
Dither is Better than Dropout for Regularising Deep Neural Networks
Regularisation of deep neural networks (DNN) during training is critical to performance. By far the most popular method is known as dropout. Here, cast through the prism of signal processing theory, we compare and contrast the regularisation effects of dropout with those of dither. We illustrate some serious inherent limitations of dropout and demonstrate that dither provides a far more effecti...
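A hedged side-by-side sketch of the two regularisers acting on a single vector of ReLU activations; the keep probability and noise amplitude are arbitrary illustrative values, not settings from the paper.

import numpy as np

rng = np.random.default_rng(1)
a = np.maximum(0.0, rng.normal(size=16))   # some ReLU activations

# Dropout: multiply by a random binary mask (inverted dropout with
# p_keep = 0.5), deleting a random subset of features outright.
p_keep = 0.5
dropout_out = a * rng.binomial(1, p_keep, size=a.shape) / p_keep

# Dither: add low-level uniform noise, gently perturbing every feature
# instead of removing some of them.
dither_out = a + rng.uniform(-0.1, 0.1, size=a.shape)

print(dropout_out)
print(dither_out)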
Journal: CoRR
Volume: abs/1509.05173
Year of publication: 2015